Non-speech Environmental Sound Classification Using Svms with a New Set of Features
نویسندگان
چکیده
Mel Frequency Cepstrum Coefficients (MFCCs) are considered as a method of stationary/pseudo-stationary feature extraction. They work very well for the classification of speech and music signals. MFCCs have also been used to classify non-speech sounds for audio surveillance systems, even though MFCCs do not completely reflect the time-varying features of non-stationary non-speech signals. We introduce a new 2D-feature set, used with a feature extraction method based on the pitch range (PR) of non-speech sounds and the Autocorrelation Function. We compare the classification accuracies of the proposed features of this new method to MFCCs by using Support Vector Machines (SVMs) and Radial Basis Function Neural Network classifiers. Non-speech environmental sounds: gunshot, glass breaking, scream, dog barking, rain, engine, and restaurant noise, were studied. The new feature set provides high accuracy rates when used as a classifier. Its usage with MFCCs significantly improves the accuracy rates of the given classifiers in the range of 4% to 35% depending on the classifier used, suggesting that both feature sets are complementary. SVM classifier using the Gaussian kernel provided the highest accuracy rates among the classifiers used in this study.
منابع مشابه
Classification of emotional speech using spectral pattern features
Speech Emotion Recognition (SER) is a new and challenging research area with a wide range of applications in man-machine interactions. The aim of a SER system is to recognize human emotion by analyzing the acoustics of speech sound. In this study, we propose Spectral Pattern features (SPs) and Harmonic Energy features (HEs) for emotion recognition. These features extracted from the spectrogram ...
متن کاملPhoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain
This article presents a new feature extraction technique based on the temporal tracking of clusters in spectro-temporal features space. In the proposed method, auditory cortical outputs were clustered. The attributes of speech clusters were extracted as secondary features. However, the shape and position of speech clusters change during the time. The clusters temporally tracked and temporal tra...
متن کاملImproving of Feature Selection in Speech Emotion Recognition Based-on Hybrid Evolutionary Algorithms
One of the important issues in speech emotion recognizing is selecting of appropriate feature sets in order to improve the detection rate and classification accuracy. In last studies researchers tried to select the appropriate features for classification by using the selecting and reducing the space of features methods, such as the Fisher and PCA. In this research, a hybrid evolutionary algorit...
متن کاملA Comparative Study of Gender and Age Classification in Speech Signals
Accurate gender classification is useful in speech and speaker recognition as well as speech emotion classification, because a better performance has been reported when separate acoustic models are employed for males and females. Gender classification is also apparent in face recognition, video summarization, human-robot interaction, etc. Although gender classification is rather mature in a...
متن کاملAcoustic Scene Recognition with Deep Learning
Background. Sound complements visual inputs, and is an important modality for perceiving the environment. Increasingly, machines in various environments have the ability to hear, such as smartphones, autonomous robots, or security systems. This work applies state-of-the-art Deep Learning models that have revolutionized speech recognition to understanding general environmental sounds. Aim. This ...
متن کامل